Variable Selection for General Index Models via Sliced Inverse Regression
Authors
Abstract
Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential variables under the general index model, in which the response depends on the predictors through an unknown function of one or more linear combinations of them. Instead of building a predictive model of the response given combinations of predictors, we model the conditional distribution of predictors given the response. This inverse modeling perspective motivates us to propose a stepwise procedure based on likelihood-ratio tests, which is effective and computationally efficient in identifying important variables without specifying a parametric relationship between predictors and the response. For example, the proposed procedure is able to detect variables with pairwise, three-way or even higher-order interactions among p predictors with a computational time of O(p) instead of O(p^k) (with k being the highest order of interactions). Its excellent empirical performance in comparison with existing methods is demonstrated through simulation studies as well as real data examples. Consistency of the variable selection procedure is established when both the number of predictors and the sample size go to infinity.
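The abstract leaves the exact test statistic unspecified, so the following is only a minimal sketch of the inverse-modeling idea: within quantile slices of the response, each not-yet-selected predictor is regressed on the currently selected ones under a Gaussian working model, a likelihood-ratio statistic compares slice-specific fits with a pooled fit, and predictors are added greedily while the statistic clears a chi-square cutoff. The slicing scheme, the Gaussian working model, the degrees-of-freedom bookkeeping and the threshold alpha are all illustrative assumptions, not the authors' procedure.

```python
import numpy as np
from scipy import stats

def make_slices(y, n_slices=5):
    """Partition observations into slices by quantiles of the response."""
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_slices + 1))
    edges[-1] = np.inf  # make sure the maximum falls in the last slice
    return [np.where((y >= edges[h]) & (y < edges[h + 1]))[0] for h in range(n_slices)]

def residual_variance(z, design):
    """Residual variance of a Gaussian least-squares fit of z on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ coef
    return max(np.mean(resid ** 2), 1e-12)

def lr_stat(xj, X_sel, slices):
    """Likelihood-ratio statistic asking whether the conditional distribution of the
    candidate column xj, given the already-selected predictors, varies across slices."""
    n = len(xj)
    design = np.column_stack([np.ones(n), X_sel])
    loglik_pooled = -0.5 * n * np.log(residual_variance(xj, design))
    loglik_sliced = sum(
        -0.5 * len(idx) * np.log(residual_variance(xj[idx], design[idx])) for idx in slices
    )
    return 2.0 * (loglik_sliced - loglik_pooled)

def stepwise_select(X, y, n_slices=5, alpha=0.005):
    """Greedy forward selection: add the predictor with the largest LR statistic
    as long as it clears a chi-square threshold (an illustrative stopping rule)."""
    n, p = X.shape
    slices = make_slices(y, n_slices)
    selected = []
    while len(selected) < p:
        X_sel = X[:, selected]
        scores = {j: lr_stat(X[:, j], X_sel, slices) for j in range(p) if j not in selected}
        best = max(scores, key=scores.get)
        # each slice frees up an intercept, |selected| slopes and a variance parameter
        df = (n_slices - 1) * (len(selected) + 2)
        if scores[best] <= stats.chi2.ppf(1.0 - alpha, df):
            break
        selected.append(best)
    return selected
```

Each pass scans the p candidates once, which is where the O(p)-versus-O(p^k) contrast in the abstract comes from; in this sketch the slice-specific variances are what allow interaction-type effects to register without ever forming product terms.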
Similar articles
Boiling Points Predictions Study via Dimension Reduction Methods: SIR, PCR and PLSR
Variable selection is an important tool in QSAR. In this article, we employ three known techniques: sliced inverse regression (SIR), principal components regression (PCR) and partial least squares regression (PLSR) for models to predict the boiling points of 530 saturated hydrocarbons. With 122 topological indices as input variables our results show that these three methods have good performanc...
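As a rough, self-contained illustration of how such a comparison can be set up, the sketch below fits SIR-, PCR- and PLSR-based predictors with NumPy and scikit-learn. The data are synthetic stand-ins (the study's 530 hydrocarbons and 122 topological indices are not reproduced here), and the numbers of slices and components are arbitrary choices rather than the settings used in the article.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def sir_directions(X, y, n_slices=10, n_dir=2):
    """Basic SIR: whiten X, average it within slices of the ordered response,
    and take the leading eigenvectors of the between-slice covariance."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    root_inv = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}
    Z = Xc @ root_inv
    slices = np.array_split(np.argsort(y), n_slices)
    M = sum(len(s) / len(y) * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0)) for s in slices)
    _, v = np.linalg.eigh(M)
    return root_inv @ v[:, -n_dir:]   # leading directions, back on the original X scale

# Synthetic stand-in data: n "compounds" described by p generic descriptors,
# with a toy response playing the role of the boiling point.
rng = np.random.default_rng(0)
n, p = 400, 40
X = rng.standard_normal((n, p))
y = X[:, :5].sum(axis=1) + 0.3 * rng.standard_normal(n)

# SIR: regress the response on the estimated indices.
dirs = sir_directions(X, y)
sir_fit = LinearRegression().fit(X @ dirs, y)

# PCR: principal components of X followed by least squares.
pcs = PCA(n_components=5).fit_transform(X)
pcr_fit = LinearRegression().fit(pcs, y)

# PLSR: components chosen to covary with the response.
pls_fit = PLSRegression(n_components=5).fit(X, y)

print("SIR R^2:", sir_fit.score(X @ dirs, y))
print("PCR R^2:", pcr_fit.score(pcs, y))
print("PLS R^2:", pls_fit.score(X, y))
```

In an actual QSAR comparison one would tune the number of slices or components by cross-validation and judge the methods on held-out prediction error rather than in-sample R^2.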
Forward Selection and Estimation in High Dimensional Single Index Models
We propose a new variable selection and estimation technique for high dimensional single index models with unknown monotone smooth link function. Among many predictors, typically, only a small fraction of them have significant impact on prediction. In such a situation, more interpretable models with better prediction accuracy can be obtained by variable selection. In this article, we propose a ...
Sliced Inverse Regression with Variable Selection and Interaction Detection
Variable selection methods play important roles in modeling high dimensional data and are keys to data-driven scientific discoveries. In this paper, we consider the problem of variable selection with interaction detection under the sliced inverse index modeling framework, in which the response is influenced by predictors through an unknown function of both linear combinations of predictors and ...
Misspecification and Heterogeneity in Single-Index, Binary Choice Models
We propose a nonparametric approach for estimating single-index, binary choice models when parametric models such as Probit and Logit are potentially misspecified. The new approach involves two steps: first, we estimate index coefficients using sliced inverse regression without specifying a parametric probability function a priori; second, we estimate the unknown probability function using kerne...
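A minimal sketch of that two-step recipe, assuming a Gaussian kernel, a hand-picked bandwidth and toy logistic data (none of which come from the paper): with a binary response SIR has exactly two slices, so its leading direction reduces to the Fisher-discriminant form (inverse covariance times the difference of the class means), and the unknown probability function is then estimated by Nadaraya-Watson smoothing along the fitted index.

```python
import numpy as np

def sir_direction_binary(X, y):
    """Step 1: SIR direction for a binary response. With only two slices
    (y = 0 and y = 1) the leading SIR direction is, up to scale,
    Sigma^{-1} (mean of class 1 - mean of class 0)."""
    cov = np.cov(X, rowvar=False)
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    beta = np.linalg.solve(cov, diff)
    return beta / np.linalg.norm(beta)

def kernel_link(index_train, y_train, index_new, bandwidth=0.3):
    """Step 2: Nadaraya-Watson estimate of P(y = 1 | x'beta) along the index,
    with a Gaussian kernel and an arbitrary bandwidth."""
    diffs = (index_new[:, None] - index_train[None, :]) / bandwidth
    w = np.exp(-0.5 * diffs ** 2)
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

# Toy data: the generating link happens to be logistic, but neither step assumes it.
rng = np.random.default_rng(1)
n, p = 500, 6
X = rng.standard_normal((n, p))
true_beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = sir_direction_binary(X, y)
p_hat = kernel_link(X @ beta_hat, y, X @ beta_hat)
```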
A note on shrinkage sliced inverse regression
We employ Lasso shrinkage within the context of sufficient dimension reduction to obtain a shrinkage sliced inverse regression estimator, which provides easier interpretations and better prediction accuracy without assuming a parametric model. The shrinkage sliced inverse regression approach can be employed for both single-index and multiple-index models. Simulation studies suggest that the new...
Publication date: 2014